Requirements for Distributed Control of Automatic Speech Recognition (ASR), Speaker Identification/Speaker Verification (SI/SV), and Text-to-Speech (TTS) Resources
نویسنده
چکیده
This document outlines the needs and requirements for a protocol to control distributed speech processing of audio streams. By speech processing, this document specifically means automatic speech recognition (ASR), speaker recognition -which includes both speaker identification (SI) and speaker verification (SV) -and text-to-speech (TTS). Other IETF protocols, such as SIP and Real Time Streaming Protocol (RTSP), address rendezvous and control for generalized media streams. However, speech processing presents additional requirements that none of the extant IETF protocols address.
منابع مشابه
RFC 4313 Speech Services Control Requirements
This document outlines the needs and requirements for a protocol to control distributed speech processing of audio streams. By speech processing, this document specifically means automatic speech recognition (ASR), speaker recognition -which includes both speaker identification (SI) and speaker verification (SV) -and text-to-speech (TTS). Other IETF protocols, such as SIP and Real Time Streamin...
متن کاملMachine Speech Chain with One-shot Speaker Adaptation
In previous work, we developed a closed-loop speech chain model based on deep learning, in which the architecture enabled the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components to mutually improve their performance. This was accomplished by the two parts teaching each other using both labeled and unlabeled data. This approach could significantly improve model perfo...
متن کاملPersonalized, Cross-Lingual TTS Using Phonetic Posteriorgrams
We present a novel approach that enables a target speaker (e.g. monolingual Chinese speaker) to speak a new language (e.g. English) based on arbitrary textual input. Our system includes a trained English speaker-independent automatic speech recognition (SI-ASR) engine using TIMIT. Given the target speaker’s speech in a non-target language, we generate Phonetic PosteriorGrams (PPGs) with the SI-...
متن کاملSynthesis using Speaker Adaptation from Speech Recognition DB
This paper deals with the creation of multiple voices from a Hidden Markov Model based speech synthesis system (HTS). More than 150 Catalan synthetic voices were built using Hidden Markov Models (HMM) and speaker adaptation techniques. Training data for building a Speaker-Independent (SI) model were selected from both a general purpose speech synthesis database (FestCat;) and a database designe...
متن کاملEvaluation of the Vulnerability of Speaker Verification to Synthetic Speech
In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in this problem. We use a HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model and a ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- RFC
دوره 4313 شماره
صفحات -
تاریخ انتشار 2005